We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within the task. We generalize this finding to meta-RL and study the dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.
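To make the link between discount factor and planning horizon concrete, the following is a minimal sketch. It uses the standard effective-horizon quantity 1 / (1 - gamma). The schedule itself (`planning_discount`, with its `gamma_start` and `gamma_eval` parameters and the 1/sqrt(n) rate) is a hypothetical illustration of a "slowly increasing" discount factor, not the heuristic derived in the paper.

```python
import math


def effective_horizon(gamma: float) -> float:
    """Standard effective planning horizon implied by a discount factor."""
    return 1.0 / (1.0 - gamma)


def planning_discount(task_index: int,
                      gamma_start: float = 0.5,
                      gamma_eval: float = 0.99) -> float:
    """Hypothetical schedule (an assumption, not the paper's heuristic).

    Starts at gamma_start for the first task and approaches gamma_eval
    at a 1/sqrt(n) rate as more tasks are observed.
    """
    if task_index < 1:
        raise ValueError("task_index must be >= 1")
    return gamma_eval - (gamma_eval - gamma_start) / math.sqrt(task_index)


# Discount factors used for planning after 1, 4, 16 and 64 tasks.
schedule = [planning_discount(n) for n in (1, 4, 16, 64)]
horizons = [effective_horizon(g) for g in schedule]
```

Under this assumed schedule the agent plans with a short horizon while its model estimates are poor and lengthens the horizon as evidence accumulates across tasks.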
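The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with its own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a taxonomy of settings, where each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, where more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, as developing a method for a given setting also makes it directly applicable to any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a wide variety of settings from both the Continual Supervised Learning (CSL) and Continual Reinforcement Learning (CRL) domains. Sequoia also includes a growing suite of methods that are easy to extend and customize, in addition to more specialized methods from external libraries. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting github.com/lebrice/Sequoia.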
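The inheritance idea can be sketched in a few lines of Python. This is a minimal illustration, assuming settings are modeled as classes where a subclass adds assumptions to its parent. All class names and the `target_setting` / `is_applicable` names are hypothetical and are not claimed to match Sequoia's actual API.

```python
class Setting:
    """Root of the tree: the most general setting, with the fewest assumptions."""


class ContinualRLSetting(Setting):
    """Hypothetical child: adds the assumption of a reinforcement learning problem."""


class TaskIncrementalRLSetting(ContinualRLSetting):
    """Hypothetical grandchild: additionally assumes task labels are available."""


class ContinualSLSetting(Setting):
    """Hypothetical sibling branch: supervised learning instead of RL."""


class Method:
    """A method declares the most general setting it was developed for."""

    target_setting = Setting

    def is_applicable(self, setting: Setting) -> bool:
        # A method applies to its target setting and to every descendant,
        # because descendants only add assumptions.
        return isinstance(setting, self.target_setting)


class SomeRLMethod(Method):
    """Hypothetical method developed for the general continual RL setting."""

    target_setting = ContinualRLSetting


method = SomeRLMethod()
applies_to_child = method.is_applicable(TaskIncrementalRLSetting())
applies_to_sibling = method.is_applicable(ContinualSLSetting())
```

Here `applies_to_child` is `True` and `applies_to_sibling` is `False`: a method written for a parent setting carries over to its children, but not to settings on another branch of the tree.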